Two-player quantitative zero-sum games provide a natural framework to synthesize controllers with
performance guarantees for reactive systems within an uncontrollable environment. Classical settings
include mean-payoff games, where the objective is to optimize the long-run average gain per action,
and energy games, where the system has to avoid running out of energy.

We study average-energy games, where the goal is to optimize the long-run average of the
accumulated energy. We show that this objective arises naturally in several applications, and that it
yields interesting connections with previous concepts in the literature. We prove that deciding the
winner in such games is in NP ∩ coNP and at least as hard as solving mean-payoff games, and we
establish that memoryless strategies suffice to win. We also consider the case where the system has to
minimize the average-energy while maintaining the accumulated energy within predefined bounds at
all times: this corresponds to operating with a finite-capacity storage for energy. We give results for
one-player and two-player games, and establish complexity bounds and memory requirements.


1 Introduction 

Quantitative games. Game-theoretic formulations are a standard tool for the synthesis of provably-correct
controllers for reactive systems [22]. We consider two-player (system vs. environment) turn-based
games played on finite graphs. Vertices of the graph are called states and partitioned into states of player 1
and states of player 2. The game is played by moving a pebble from state to state, along edges in the
graph, and starting from a given initial state. Whenever the pebble is on a state belonging to player i,
player i decides where to move the pebble next, according to his strategy. The infinite path followed by
the pebble is called a play; it represents one possible behavior of the system. A winning objective encodes
acceptable behaviors of the system and can be seen as a set of winning plays. The goal of player 1 is to
ensure that the outcome of the game will be a winning play, whatever the strategy played by his adversary.

To reason about resource constraints and the performance of strategies, quantitative games have
been considered in the literature. See for example [10, 3, 29], or [30] for an overview. Those games
are played on weighted graphs, where edges are fitted with integer weights modeling rewards or costs.
The performance of a play is evaluated via a payoff function that maps it to the numerical domain. The
objective of player 1 is then to ensure a sufficient payoff with regard to a given threshold value. Seminal
classes of quantitative games include mean-payoff (MP), total-payoff (TP) and energy games (EG). In MP
games [15, 33, 24], player 1 has to optimize his long-run average gain per edge taken whereas, in TP
games [20, 19], player 1 has to optimize his long-run sum of weights. Energy games [10, 5, 23] model
safety-like properties: the goal is to ensure that the running sum of weights never drops below zero
and/or that it never exceeds a given upper bound $U \in \mathbb{N}$. All three classes share common properties.
First, MP games, TP games, and EG games with only a lower bound ($EG_L$) are memoryless determined
(given an initial state, either player 1 has a strategy to win, or player 2 has one, and in both cases no
memory is required to win). Second, deciding the winner for those games is in NP ∩ coNP and no
polynomial algorithm is known despite many efforts (e.g., [8, 12]). Energy games with both lower and
upper bounds ($EG_{LU}$) are more complex: they are EXPTIME-complete and winning requires memory in
general [5].

While those classes are well-known, it is sometimes necessary to go beyond them to accurately model 
practical applications. For example, multi-dimensional games and conjunctions with a parity objective 
model trade-offs between different quantitative aspects [11, 14, 32]. Similarly, window objectives address 
the need for strategies ensuring good quantitative behaviors within reasonable time frames [12]. 

Average-energy games. We study the average-energy (AE) payoff function: in AE games, the goal of 
player 1 is to optimize the long-run average accumulated energy over a play. We introduce this objective 
to formalize the specification desired in a practical application [9], which we detail in the following as a 
motivating example. Interestingly, it turns out that this payoff first appeared long ago [31], but it was not 
subject to a systematic study until very recently: see related work for more discussion. 

In addition to being meaningful w.r.t. practical applications, AE games also have theoretical interest.
In [13], Chatterjee and Prabhu define the average debit-sum level objective, which can be seen as a
variation of the average-energy where the accumulated energy is taken to be zero in any point where it is
actually positive (hence, it focuses on the average debt). They use the corresponding games to compute
the values of quantitative timed simulation functions. In particular, they provide a pseudo-polynomial-time
algorithm to solve those games, but the complexity of deciding the winner as well as the memory
requirements are open. Here, we solve those questions for the very similar average-energy objective.

Motivating example. Our example is a simplified version of the industrial application studied by Cassez
et al. [9]. Consider a machine that consumes oil, stored in a connected accumulator. We want to synthesize
an appropriate controller to operate the oil pump that fills the accumulator, and by the effect of pressure,
that releases oil from the accumulator into the machine with a (time-varying) rate according to desired
production. In order to ensure safety, the oil level in the accumulator should be maintained at all times
between a minimal and a maximal level. This part of the specification can be encoded as an energy
objective with both lower and upper bounds ($EG_{LU}$). At the same time, the more oil (thus pressure) in the
accumulator, the faster the whole apparatus wears out. Hence, an ideal controller should minimize the
average level of oil in the long run. This desire can be formalized through the average-energy payoff (AE).
Overall, the specification is thus to minimize the average-energy under the strong energy constraints: we
denote the corresponding objective by $AE_{LU}$.

Contributions. Our main results are summarized in Table 1.

A) We establish that the average-energy objective can be seen as a refinement of total-payoff, in the
same sense as total-payoff is seen as a refinement of mean-payoff [19]: it allows one to distinguish strategies
yielding identical mean-payoff and total-payoff.

B) We show that deciding the winner in two-player AE games is in NP ∩ coNP whereas it is in P
for one-player games. In both cases, memoryless strategies suffice (Thm. 5). Those complexities match
the state-of-the-art for MP and TP games [33, 24, 19, 8]. Furthermore, we prove that AE games are at
least as hard as mean-payoff games (Thm. 7). Therefore, the NP ∩ coNP-membership can be considered
optimal w.r.t. our knowledge of MP games. Technically, the crux of our approach is as follows. First,
we show that memoryless strategies suffice in one-player AE games (Thm. 3): this requires proving
important properties of the AE payoff, as classical sufficient criteria for memoryless determinacy present
in the literature fail to apply directly. Second, we establish a polynomial-time algorithm for the one-player
case: it exploits the structure of winning strategies and mixes graph techniques with local linear program
solving (Thm. 4). Finally, we lift memoryless determinacy to the two-player case using results by Gimbert
and Zielonka [21] and obtain the NP ∩ coNP-membership as a corollary (Thm. 6).

Game objective          | 1-player                 | 2-player              | memory
MP                      | in P [26]                | in NP ∩ coNP [33]     | memoryless [15]
TP                      | in P [17]                | in NP ∩ coNP [19]     | memoryless [20]
$EG_L$                  | in P [5]                 | in NP ∩ coNP [10, 5]  | memoryless [10]
$EG_{LU}$               | PSPACE-complete [16]     | EXPTIME-complete [5]  | pseudo-polynomial
AE                      | in P                     | in NP ∩ coNP          | memoryless
$AE_{LU}$, polynomial U | in P                     | in NP ∩ coNP          | polynomial
$AE_{LU}$, arbitrary U  | in EXPTIME / PSPACE-hard | EXPTIME-complete      | pseudo-polynomial
$AE_L$                  | EXPTIME-easy / NP-hard   | open / EXPTIME-hard   | open (≥ pseudo-p.)

Table 1: Complexity of deciding the winner and memory requirements for quantitative games: MP stands
for mean-payoff, TP for total-payoff, $EG_L$ (resp. $EG_{LU}$) for lower-bounded (resp. lower- and
upper-bounded) energy, AE for average-energy, and $AE_L$ (resp. $AE_{LU}$) for average-energy under a lower bound
(resp. and upper bound $U \in \mathbb{N}$) on the energy. Results without reference are proved in this paper.

C) We establish an EXPTIME algorithm to solve two-player AE games with lower- and upper-bounded
energy ($AE_{LU}$) with an arbitrary upper bound $U \in \mathbb{N}$ (Thm. 8). It relies on a reduction of
the $AE_{LU}$ game to a pseudo-polynomially larger AE game where the energy constraints are encoded
in the graph structure. Straightforwardly applying the AE algorithm on this game would only give us
NEXPTIME ∩ coNEXPTIME-membership, hence we avoid this blowup by further reducing the problem
to a particular MP game and applying a pseudo-polynomial algorithm, with some care to ensure that
overall the algorithm only requires pseudo-polynomial time in the original $AE_{LU}$ game. Since the simpler
$EG_{LU}$ games (i.e., $AE_{LU}$ with a trivial AE constraint) are already EXPTIME-hard [5], the
EXPTIME-membership result is optimal. We also prove that pseudo-polynomial memory is both sufficient and in
general necessary to win in $AE_{LU}$ games, for both players (Thm. 9). Whether one-player $AE_{LU}$ games
belong to PSPACE is an open question. For polynomial (in the size of the game graph) values of the
upper bound $U$, or if it is given in unary, the complexity of the two-player $AE_{LU}$ problem collapses to
NP ∩ coNP with the same approach, and polynomial memory suffices for both players.

D) We provide partial answers for the $AE_L$ objective (AE under a lower bound constraint on energy
but no upper bound). We provide an EXPTIME algorithm for the one-player case, by reducing the problem
to an $AE_{LU}$ game with a sufficiently large upper bound. That is, we prove that if the player can win for the
$AE_L$ objective, then he can do so without ever increasing his energy above a well-chosen bound. We also
prove the $AE_L$ problem to be at least NP-hard in one-player games and EXPTIME-hard in two-player
games (Lem. 12) via reductions from the subset-sum problem and countdown games respectively. Finally,
we show that memory is required for both players in two-player $AE_L$ games (Lem. 13), and that
pseudo-polynomial memory is both sufficient and necessary in the one-player case (Thm. 10). The decidability
status of two-player $AE_L$ games remains open as we only provide a correct but incomplete incremental
algorithm (Lem. 11). We conjecture that the two-player $AE_L$ problem is decidable and sketch a potential
approach to solve it. We highlight the key remaining questions and discuss some connections with related
models that are known to be difficult.

Observe that in many applications, the energy must be stored in a finite-capacity storage for which
an upper bound is provided. Hence, the model of choice in this case is $AE_{LU}$.
Related work. The average-energy payoff, Eq. (1), appeared in a paper by Thuijsman and Vrieze in
the late eighties [31], under the name total-reward. This definition is different from the classical
total-payoff (see Sect. 2) commonly studied in the formal methods community (see for example [20, 19]),
which, despite that, has been referred to in many papers as either total-payoff or total-reward equivalently.
We will see in this paper that both definitions are indeed different and exhibit different behaviors.

Maybe due to this confusion, the payoff of Eq. (1), which we call average-energy to avoid
misunderstandings, was not studied extensively until recently. Nothing was known about memoryless
determinacy and complexity of deciding the winner. Independently of our work, Boros et al. recently
studied the same payoff (under the name total-payoff). In [4], they study Markov decision processes and
stochastic games with the payoff of Eq. (1) and solve both questions. Their results overlap with ours
for AE games (Table 1). Let us first mention that our results were obtained independently. Second, and
most importantly, our approach and techniques are different, and we believe our take on the problem
yields some interest for our community. Indeed, the algorithm of Boros et al. entirely relies on linear
programming in the one-player case, and resorts to approximation by discounted games in the two-player
one. Our techniques are arguably more constructive and based on inherent properties of the payoff. In
that sense, our approach is closer to what is usually deemed important in our field. For example, we provide an
extensive comparison with classical payoffs. We base our proof of memoryless determinacy on operational
understanding of the AE which is crucial in order to formalize proper specifications. Our technique then
benefits from seminal works [21] to bypass the reduction to discounted games and obtain a direct proof,
thanks to our more constructive approach. Lastly, while [4] considers the AE problem in the stochastic
context, we focus on the deterministic one but consider multi-criteria extensions by adding bounds on the
energy ($AE_{LU}$ and $AE_L$ games). Those extensions are completely new, exhibit theoretical interest and are
adequate for practical applications in constrained energy systems, as witnessed by the case study of [9].

Recent work of Brazdil et al. [7] considers the optimization of a payoff under energy constraint. They 
study mean-payoff in consumption systems, i.e., simplified one-player energy games where all edges 
consume energy but some states can atomically produce a reload of the energy up to the allowed capacity. 

Full details and proofs of the results presented here can be found in the extended paper [6].


2 Preliminaries 

Graph games. We consider turn-based games played on graphs between two players denoted by $\mathcal{P}_1$
and $\mathcal{P}_2$. A game is a tuple $G = (S_1, S_2, E, w)$ where (i) $S_1$ and $S_2$ are disjoint finite sets of states belonging
to $\mathcal{P}_1$ and $\mathcal{P}_2$, with $S = S_1 \uplus S_2$, (ii) $E \subseteq S \times S$ is a finite set of edges, and (iii) $w\colon E \to \mathbb{Z}$ is an integer
weight function. Given edge $(s_1, s_2) \in E$, we write $w(s_1, s_2)$ as a shortcut for $w((s_1, s_2))$. We denote by $W$
the largest absolute weight assigned by function $w$. A game is called 1-player if $S_1 = \emptyset$ or $S_2 = \emptyset$.

A play from an initial state $s_{init} \in S$ is an infinite sequence $\pi = s_0 s_1 \ldots s_n \ldots$ such that $s_0 = s_{init}$ and for
all $i \geq 0$ we have $(s_i, s_{i+1}) \in E$. The (finite) prefix of $\pi$ up to position $n$ gives the sequence $\pi(n) = s_0 s_1 \ldots s_n$,
the last element $s_n$ is denoted $\mathit{last}(\pi(n))$. The set of all plays in $G$ is denoted by $\mathit{Plays}(G)$ and the set
of all prefixes is denoted by $\mathit{Prefs}(G)$. We say that a prefix $\rho \in \mathit{Prefs}(G)$ belongs to $\mathcal{P}_i$, $i \in \{1, 2\}$, if
$\mathit{last}(\rho) \in S_i$. The set of prefixes that belong to $\mathcal{P}_i$ is denoted by $\mathit{Prefs}_i(G)$. The classical concatenation
between prefixes (resp. prefix and play) is denoted by the $\cdot$ operator. The length of a non-empty prefix
$\rho = s_0 \ldots s_n$ is defined as the number of edges and denoted by $|\rho| = n$.

Payoffs of plays. Given a play $\pi = s_0 s_1 \ldots s_n \ldots$ we define

• its energy level at position $n$ as $EL(\pi(n)) = \sum_{i=0}^{n-1} w(s_i, s_{i+1})$;

• its mean-payoff as $\overline{MP}(\pi) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} w(s_i, s_{i+1}) = \limsup_{n \to \infty} \frac{1}{n} EL(\pi(n))$;

• its total-payoff as $\overline{TP}(\pi) = \limsup_{n \to \infty} \sum_{i=0}^{n-1} w(s_i, s_{i+1}) = \limsup_{n \to \infty} EL(\pi(n))$;

• and its average-energy as

$$\overline{AE}(\pi) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=0}^{i-1} w(s_j, s_{j+1}) \right) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} EL(\pi(i)). \tag{1}$$

We will sometimes consider those measures defined with liminf instead of limsup, in which case we
write $\underline{MP}$, $\underline{TP}$ and $\underline{AE}$ respectively. Finally, we also consider those measures over prefixes: we naturally
define them by dropping the $\limsup_{n \to \infty}$ operator and taking $n = |\rho|$ for a prefix $\rho \in \mathit{Prefs}(G)$. In this
case, we simply write $MP(\rho)$, $TP(\rho)$ and $AE(\rho)$ to denote the fact that we consider finite sequences.
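The prefix versions of these payoffs are easy to compute directly. The following is a minimal sketch, assuming a prefix is represented only by the list of its edge weights; the function names and the sample prefix are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import accumulate


def energy_levels(weights):
    """Energy level after each edge of a finite prefix: [EL(1), ..., EL(n)]."""
    return list(accumulate(weights))


def mean_payoff(weights):
    """MP of a prefix: final energy level divided by the number of edges."""
    return Fraction(sum(weights), len(weights))


def total_payoff(weights):
    """TP of a prefix: final energy level."""
    return sum(weights)


def average_energy(weights):
    """AE of a prefix: average of the energy levels seen after each edge."""
    levels = energy_levels(weights)
    return Fraction(sum(levels), len(levels))


# Hypothetical prefix with edge weights 2, -1, -1 (not taken from the paper).
prefix = [2, -1, -1]
print(energy_levels(prefix))   # [2, 1, 0]
print(mean_payoff(prefix))     # 0
print(total_payoff(prefix))    # 0
print(average_energy(prefix))  # 1
```

Exact rationals (`Fraction`) are used so that thresholds in $\mathbb{Q}$ can be compared without rounding issues.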

Strategies. A strategy for $\mathcal{P}_i$, $i \in \{1, 2\}$, is a function $\sigma_i\colon \mathit{Prefs}_i(G) \to S$ such that for all $\rho \in \mathit{Prefs}_i(G)$
we have $(\mathit{last}(\rho), \sigma_i(\rho)) \in E$. A strategy $\sigma_i$ for $\mathcal{P}_i$ is finite-memory if it can be encoded by a deterministic
finite-state Moore machine. A strategy is memoryless if it does not depend on the history but only on the
current state of the game. We denote by $\Sigma_i(G)$ the set of strategies for player $\mathcal{P}_i$. We drop $G$ when the
context is clear.

A play $\pi = s_0 s_1 \ldots$ is consistent with a strategy $\sigma_i$ of $\mathcal{P}_i$ if, for all $n \geq 0$ where $\mathit{last}(\pi(n)) \in S_i$, we
have $\sigma_i(\pi(n)) = s_{n+1}$. Given an initial state $s_{init} \in S$ and strategies $\sigma_1$ and $\sigma_2$ for the two players, we
denote by $\mathit{Outcome}(s_{init}, \sigma_1, \sigma_2)$ the unique play that starts in $s_{init}$ and is consistent with both $\sigma_1$ and $\sigma_2$.
When fixing the strategy of only $\mathcal{P}_i$, we denote the set of consistent outcomes by $\mathit{Outcomes}(s_{init}, \sigma_i)$.
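When both strategies are memoryless, the unique outcome is a lasso: a finite prefix followed by a cycle repeated forever. A minimal sketch of how it can be computed is given below, assuming memoryless strategies are stored as dictionaries from states to successors; the names `outcome_lasso`, `owner`, `sigma1` and `sigma2` are hypothetical.

```python
def outcome_lasso(s_init, sigma1, sigma2, owner):
    """Follow two memoryless strategies from s_init until a state repeats.

    owner[s] is 1 or 2 (the player owning state s); sigma1 and sigma2 map the
    states of the corresponding player to the chosen successor.
    Returns (prefix, cycle) as lists of states, the outcome being prefix . cycle^omega.
    """
    first_seen = {}  # state -> index of its first visit
    path = []
    s = s_init
    while s not in first_seen:
        first_seen[s] = len(path)
        path.append(s)
        s = sigma1[s] if owner[s] == 1 else sigma2[s]
    start = first_seen[s]
    return path[:start], path[start:]


# Hypothetical three-state game graph.
owner = {"a": 1, "b": 2, "c": 1}
sigma1 = {"a": "b", "c": "b"}
sigma2 = {"b": "c"}
print(outcome_lasso("a", sigma1, sigma2, owner))  # (['a'], ['b', 'c'])
```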

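Objectives. An objective in $G$ is a set $\mathcal{W} \subseteq \mathit{Plays}(G)$ that is declared winning for $\mathcal{P}_1$. Given a game $G$,
an initial state $s_{init}$, and an objective $\mathcal{W}$, a strategy $\sigma_1 \in \Sigma_1$ is winning for $\mathcal{P}_1$ if for all strategies $\sigma_2 \in \Sigma_2$,
we have that $\mathit{Outcome}(s_{init}, \sigma_1, \sigma_2) \in \mathcal{W}$. Symmetrically, a strategy $\sigma_2 \in \Sigma_2$ is winning for $\mathcal{P}_2$ if for all
strategies $\sigma_1 \in \Sigma_1$, we have that $\mathit{Outcome}(s_{init}, \sigma_1, \sigma_2) \notin \mathcal{W}$. That is, we consider zero-sum games.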

We consider the following objectives and combinations of those objectives. 

• Given an initial energy level $c_{init} \in \mathbb{N}$, the lower-bounded energy ($EG_L$) objective $\mathit{Energy}_L(c_{init}) =
\{\pi \in \mathit{Plays}(G) \mid \forall n \geq 0,\ c_{init} + EL(\pi(n)) \geq 0\}$ requires non-negative energy at all times.

• Given an upper bound $U \in \mathbb{N}$ and an initial energy level $c_{init} \in \mathbb{N}$, the lower- and upper-bounded
energy ($EG_{LU}$) objective $\mathit{Energy}_{LU}(U, c_{init}) = \{\pi \in \mathit{Plays}(G) \mid \forall n \geq 0,\ c_{init} + EL(\pi(n)) \in [0, U]\}$
requires that the energy always remains non-negative and below the upper bound $U$ along a play.

• Given a threshold $t \in \mathbb{Q}$, the mean-payoff (MP) objective $\mathit{MeanPayoff}(t) = \{\pi \in \mathit{Plays}(G) \mid
\overline{MP}(\pi) \leq t\}$ requires that the mean-payoff is at most $t$.

• Given a threshold $t \in \mathbb{Z}$, the total-payoff (TP) objective $\mathit{TotalPayoff}(t) = \{\pi \in \mathit{Plays}(G) \mid
\overline{TP}(\pi) \leq t\}$ requires that the total-payoff is at most $t$.

• Given a threshold $t \in \mathbb{Q}$, the average-energy (AE) objective $\mathit{AvgEnergy}(t) = \{\pi \in \mathit{Plays}(G) \mid
\overline{AE}(\pi) \leq t\}$ requires that the average-energy is at most $t$.

For the MP, TP and AE objectives, note that $\mathcal{P}_1$ aims to minimize the payoff value while $\mathcal{P}_2$ tries to
maximize it. The reversed convention is also often used in the literature but both are equivalent. For our
motivating example, seeing $\mathcal{P}_1$ as a minimizer is more natural. Note that we define the objectives using
the limsup variants of MP, TP and AE, but similar results are obtained for the liminf variants.

Decision problem. Given a game $G$, an initial state $s_{init} \in S$, and an objective $\mathcal{W} \subseteq \mathit{Plays}(G)$ as defined
above, the associated decision problem is to decide if $\mathcal{P}_1$ has a winning strategy for this objective.

We recall classical results in Table 1. Memoryless strategies suffice for both players for $EG_L$ [10, 5],
MP [15] and TP [17, 20] objectives. Since all associated problems can be solved in polynomial time
for 1-player games, it follows that the 2-player decision problem is in NP ∩ coNP for those three
objectives [5, 33, 19]. For the $EG_{LU}$ objective, memory is in general needed and the associated decision
problem is EXPTIME-complete [5] (PSPACE-complete for one-player games [16]).

Game values. Given a game with an objective $\mathcal{W} \in \{\mathit{MeanPayoff}, \mathit{TotalPayoff}, \mathit{AvgEnergy}\}$ and an
initial state $s_{init}$, we refer to the value from $s_{init}$ as $v = \inf\{t \in \mathbb{Q} \mid \exists \sigma_1 \in \Sigma_1,\ \mathit{Outcomes}(s_{init}, \sigma_1) \subseteq \mathcal{W}(t)\}$.
For both MP and TP objectives, it is known that the value can be achieved by an optimal memoryless
strategy; for the AE objective it follows from our results (Thm. 5).

3 Average-Energy 

In this section, we consider the problem of ensuring a sufficiently low average-energy. 

Problem 1 (AE). Given a game $G$, an initial state $s_{init}$, and a threshold $t \in \mathbb{Q}$, decide if $\mathcal{P}_1$ has a winning
strategy $\sigma_1 \in \Sigma_1$ for the objective $\mathit{AvgEnergy}(t)$.

3.1 Relation with classical objectives 

Several links between $EG_L$, MP and TP objectives can be established. Intuitively, $\mathcal{P}_1$ can only ensure a
lower bound on energy if he can prevent $\mathcal{P}_2$ from enforcing strictly-negative cycles (otherwise the initial
energy is eventually exhausted). This is the case if and only if $\mathcal{P}_1$ can ensure a non-negative mean-payoff
in $G$ (here, he wants to maximize the MP), and if this is the case, $\mathcal{P}_1$ can prevent the running sum of
weights from ever going too far beyond zero along a play, hence granting a lower bound on total-payoff.

The TP objective is sometimes seen as a refinement of MP for the case where $\mathcal{P}_1$, as a minimizer,
can ensure MP equal to zero but not lower, i.e., the MP game has value zero [19]. Indeed, one may
use the TP to further discriminate between strategies that guarantee MP zero. In the same philosophy,
the average-energy can help in distinguishing strategies that yield identical total-payoffs. See Fig. 1.
The AE values in both examples can be computed easily using the upcoming technical lemmas (Sect. 3.2).

In these examples, the average-energy clearly lies between the infimum and supremum
total-payoffs. This remains true for any play. In particular, if the mean-payoff value from a state is not
zero, its total-payoff value is infinite and the following holds: either $\mathcal{P}_1$ can force AE equal to $-\infty$ or $\mathcal{P}_2$
can force AE equal to $+\infty$.

3.2 Useful properties of the average-energy 

Classical sufficient criteria. Various sufficient criteria (or connected approaches) to deduce
memoryless determinacy appear in the literature [15, 2, 1, 20, 27]. Unfortunately, they cannot be applied
straight out of the box to the AE payoff. Intuitively, a common requirement is for winning objectives to be
closed under cyclic permutation and under concatenation. Without further assumptions, the AE objective
satisfies neither. Indeed, consider cycles represented by sequences of weights $\mathcal{C}_1 = \{-1\}$, $\mathcal{C}_2 = \{1\}$ and
$\mathcal{C}_3 = \{1, -2\}$. We see that $AE(\mathcal{C}_1 \mathcal{C}_2) = (-1 + 0)/2 = -1/2 < AE(\mathcal{C}_2 \mathcal{C}_1) = (1 + 0)/2 = 1/2$, hence AE
is not closed under cyclic permutations. Intuitively, the order in which the weights are seen does matter, in
contrast to most classical payoffs. For concatenation, see that $AE(\mathcal{C}_3) = 0$ while $AE(\mathcal{C}_3 \mathcal{C}_3) = -1/2 < 0$.
Here the intuition is that the overall AE is impacted by the energy of the first cycle which is strictly
negative ($-1$). In a sense, the AE of a cycle can only be maintained through repetition if this cycle is
neutral with regard to the total energy level, i.e., if it has energy level zero: we will formalize this intuition
in Lem. 2.
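The counterexamples above can be checked by listing the energy level after each edge and averaging. The following worked computation only unpacks the prefix definition of AE for the three weight sequences named in the text; no further assumption is involved.

```latex
\begin{align*}
\mathcal{C}_1\mathcal{C}_2 = (-1, 1):&\quad EL = (-1, 0), & AE &= \tfrac{-1 + 0}{2} = -\tfrac{1}{2},\\
\mathcal{C}_2\mathcal{C}_1 = (1, -1):&\quad EL = (1, 0), & AE &= \tfrac{1 + 0}{2} = \tfrac{1}{2},\\
\mathcal{C}_3 = (1, -2):&\quad EL = (1, -1), & AE &= \tfrac{1 - 1}{2} = 0,\\
\mathcal{C}_3\mathcal{C}_3 = (1, -2, 1, -2):&\quad EL = (1, -1, 0, -2), & AE &= \tfrac{1 - 1 + 0 - 2}{4} = -\tfrac{1}{2}.
\end{align*}
```

The last line shows the effect of the non-zero energy level of $\mathcal{C}_3$: the second copy starts at energy $-1$, which shifts both of its energy levels down by one.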
Fig. 1: Both plays have identical mean-payoff and total-payoff: $\overline{MP}(\pi_1) = \underline{MP}(\pi_1) = \overline{MP}(\pi_2) =
\underline{MP}(\pi_2) = 0$, $\overline{TP}(\pi_1) = \overline{TP}(\pi_2) = 5$, and $\underline{TP}(\pi_1) = \underline{TP}(\pi_2) = 1$. But play $\pi_1$ has a lower average-energy:
$\overline{AE}(\pi_1) = \underline{AE}(\pi_1) = 3 < \overline{AE}(\pi_2) = \underline{AE}(\pi_2) = 11/3$.


Extraction of prefixes. We establish two useful properties of the average-energy that help us to prove
memoryless determinacy. The following lemma describes the impact of adding a finite prefix to an infinite
play: it will help us in decomposing plays when needed.

Lemma 1. [Average-energy prefix] Let $\rho \in \mathit{Prefs}(G)$, $\pi \in \mathit{Plays}(G)$. Then, $\overline{AE}(\rho \cdot \pi) = EL(\rho) + \overline{AE}(\pi)$.
The same equality holds for $\underline{AE}$.
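A short calculation suggests why Lemma 1 holds. This is only a sketch written for illustration, with $m = |\rho|$ introduced here as a shorthand; it is not claimed to be the argument used in the extended paper [6].

```latex
% Sketch, assuming m = |\rho| and n > m.
% For every position i > m, the energy level of the concatenated play satisfies
%   EL((\rho\cdot\pi)(i)) = EL(\rho) + EL(\pi(i-m)).
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} EL\big((\rho\cdot\pi)(i)\big)
  &= \frac{1}{n}\sum_{i=1}^{m} EL(\rho(i))
   + \frac{n-m}{n}\,EL(\rho)
   + \frac{n-m}{n}\cdot\frac{1}{n-m}\sum_{i=1}^{n-m} EL(\pi(i)).
\end{align*}
% As n \to \infty, the first term vanishes (it is a fixed finite sum divided by n),
% the factor (n-m)/n tends to 1, and the limsup of the last average is \overline{AE}(\pi).
% Taking the limsup of both sides therefore gives EL(\rho) + \overline{AE}(\pi).
```

The same decomposition, read with liminf instead of limsup, gives the statement for $\underline{AE}$.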

Extraction of a best cycle. The next lemma is crucial to prove that memoryless strategies suffice: under
well-chosen conditions, one can always select a best cycle in a play; hence, there is no interest in mixing
different cycles and no use for memory. It holds only for sequences of cycles that have energy level zero:
since they do not change the energy, they do not modify the AE of the following suffix of play, and one
can decompose the AE as a weighted average over zero cycles. The concatenation of cycles $\mathcal{C}_a = s s' \ldots s$
and $\mathcal{C}_b = s s'' \ldots s$ is to be understood as $\mathcal{C}_a \cdot \mathcal{C}_b = s s' \ldots s s'' \ldots s$.

Lemma 2. [Repeated zero cycles of bounded length] Let $\mathcal{C}_1, \mathcal{C}_2, \mathcal{C}_3, \ldots$ be an infinite sequence of cycles
$\mathcal{C}_i \in \mathit{Prefs}(G)$ such that (i) $\pi = \mathcal{C}_1 \cdot \mathcal{C}_2 \cdot \mathcal{C}_3 \cdots \in \mathit{Plays}(G)$, (ii) $\forall i \geq 1$, $EL(\mathcal{C}_i) = 0$ and (iii) $\exists \ell \in \mathbb{N}_{>0}$
such that $\forall i \geq 1$, $|\mathcal{C}_i| \leq \ell$. Then the following properties hold.

1. The average-energy of $\pi$ is the weighted average of the average-energies of the cycles:

$$\overline{AE}(\pi) = \limsup_{k \to \infty} \frac{\sum_{i=1}^{k} |\mathcal{C}_i| \cdot AE(\mathcal{C}_i)}{\sum_{i=1}^{k} |\mathcal{C}_i|}. \tag{2}$$

2. For any cycle $\mathcal{C} \in \mathit{Prefs}(G)$ such that $EL(\mathcal{C}) = 0$, we have that $\overline{AE}(\mathcal{C}^\omega) = AE(\mathcal{C})$.

3. Repeating the best cycle gives the lowest AE: $\min_{i \in \mathbb{N}_{>0}} AE(\mathcal{C}_i) = \inf_{i \in \mathbb{N}_{>0}} \overline{AE}((\mathcal{C}_i)^\omega) \leq \overline{AE}(\pi)$.

Similar properties hold for $\underline{AE}$.
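On a finite concatenation of zero cycles, the weighted-average identity of Lemma 2 holds exactly, which makes it easy to check numerically. The sketch below assumes cycles are given as lists of edge weights summing to zero; the three sample cycles and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def average_energy(weights):
    """AE of a finite sequence of edge weights, starting from energy 0."""
    levels = list(accumulate(weights))
    return Fraction(sum(levels), len(levels))


def weighted_cycle_average(cycles):
    """Length-weighted average of the AE of each cycle, as in Eq. (2)."""
    assert all(sum(c) == 0 for c in cycles), "only zero cycles are allowed"
    total_len = sum(len(c) for c in cycles)
    return sum(len(c) * average_energy(c) for c in cycles) / total_len


# Hypothetical zero cycles, given by their weight sequences.
cycles = [[1, -1], [-1, 1], [2, -1, -1]]
concatenation = [w for c in cycles for w in c]

print(average_energy(concatenation))           # 3/7
print(weighted_cycle_average(cycles))          # 3/7
print(min(average_energy(c) for c in cycles))  # -1/2
```

The minimum over the individual cycles ($-1/2$, reached by the second cycle) is below the AE of the mix ($3/7$), in line with item 3: repeating only the best cycle cannot be worse than mixing.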


3.3 One-player games 

We assume that the unique player is $\mathcal{P}_1$, hence that $S_2 = \emptyset$. The proofs are similar for the case where
all states belong to $\mathcal{P}_2$ (i.e., $S_1 = \emptyset$). Similarly, we present our results for the $\overline{AE}$ variant, but they carry
over to the $\underline{AE}$ one. Actually, since we show that we can restrict ourselves to memoryless strategies, all
consistent outcomes will be periodic and thus both variants will be equal over those outcomes.
(a) Original game. (b) Expanded graph for $k = 2$.

Fig. 2: The best cycle $\mathcal{C}_{s,2}$ is computed by looking for a path from $(s, 2)$ to $(s, 0)$ with sum zero in the first
dimension (zero cycle) and minimal sum in the second dimension (minimal AE). Here, the cycle via $s'$ is
clearly better, with AE equal to $-1/2$ in contrast to $1/2$ via $s''$.


Memoryless determinacy. Intuitively, we use Lem. 1 and Lem. 2 to transform any arbitrary path into a
simple lasso path, repeating a unique simple cycle, and yielding an AE that is at least as good, thus proving that
any threshold achievable with memory can also be achieved without it.

Theorem 3. Memoryless strategies are sufficient to win one-player AE games. 

Polynomial-time algorithm. We know the form of optimal memoryless strategies: an optimal lasso
path $\pi = \rho \cdot \mathcal{C}^\omega$ w.r.t. the AE. We establish a polynomial-time algorithm to solve one-player AE games.

The crux is computing, for each state $s$, the best (w.r.t. the AE) zero cycle $\mathcal{C}_s$ starting and ending
in $s$ (if any). This is achieved through linear programming (LP) over expanded graphs. For each state $s$
and length $k \in \{1, \ldots, |S|\}$, we compute the best cycle $\mathcal{C}_{s,k}$ by considering a graph (Fig. 2) that models
all cycles of length $k$ from $s$ and that uses $k + 1$ levels and two-dimensional weights on edges of the
form $(c, l \cdot c)$ where $c$ is the weight in the original game and $l \in \{k, k-1, \ldots, 1\}$ is the level of the edge.
In the LP, we look for cycles $\mathcal{C}_{s,k}$ of length $k$ on $s$ such that (a) the sum of weights in the first dimension
is zero (thus $\mathcal{C}_{s,k}$ is a zero cycle), and (b) the sum in the second one is minimal. Fortunately, this sum is
exactly equal to $AE(\mathcal{C}_{s,k}) \cdot k$ thanks to the $l$ factors used in the weights of the expanded graph. Hence, we
obtain the optimal cycle $\mathcal{C}_{s,k}$ (in polynomial time). Doing this $|S|$ times for each state $s$, we obtain for
each of them the optimal cycle $\mathcal{C}_s$ (if one zero cycle exists). Then, by Lem. 1, it remains to compute the
least EL with which each state $s$ can be reached using classical graph techniques (e.g., Bellman-Ford),
and to pick the optimal combination to obtain an optimal memoryless strategy, in polynomial time.
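To make the "best zero cycle" step concrete, here is a small sketch that swaps the LP over expanded graphs for a plain dynamic program over (state, energy) pairs. This substitute is only pseudo-polynomial, so it illustrates what is being optimized rather than how the polynomial bound is obtained. The function name, the edge representation, and the weights of the sample game (chosen to match the AE values $-1/2$ and $1/2$ quoted for Fig. 2) are assumptions.

```python
from fractions import Fraction


def best_zero_cycle_ae(edges, s, max_len):
    """Minimal AE over closed walks from s back to s with total energy 0.

    edges maps each state to a list of (successor, weight) pairs; walks of
    length 1..max_len are considered. Returns None if no zero cycle exists.
    """
    best = None
    # (state, current energy) -> minimal sum of energy levels seen so far
    layer = {(s, 0): 0}
    for k in range(1, max_len + 1):
        next_layer = {}
        for (u, energy), acc in layer.items():
            for v, w in edges.get(u, []):
                key = (v, energy + w)
                cand = acc + energy + w  # add the energy level reached after this edge
                if key not in next_layer or cand < next_layer[key]:
                    next_layer[key] = cand
        layer = next_layer
        if (s, 0) in layer:
            ae = Fraction(layer[(s, 0)], k)  # sum of energy levels = AE * k
            if best is None or ae < best:
                best = ae
    return best


# Assumed weights for a game shaped like Fig. 2(a): two 2-edge cycles on s.
edges = {
    "s": [("s'", -1), ("s''", 1)],
    "s'": [("s", 1)],
    "s''": [("s", -1)],
}
print(best_zero_cycle_ae(edges, "s", 3))  # -1/2
```

The accumulated quantity is the sum of energy levels, which equals the sum of level-weighted edge weights $l \cdot c$ used in the second dimension of the expanded graph.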

Theorem 4. The AE problem for one-player games is in P. 


3.4 Two-player games 

Memoryless determinacy. We now prove that memoryless strategies still suffice in two-player games.
As discussed in Sect. 3.2, classical criteria do not apply. There is, however, one result that proves
particularly useful. Consider any payoff function such that memoryless strategies suffice for both one-player
versions ($S_1 = \emptyset$, resp. $S_2 = \emptyset$). In [21, Cor. 7], Gimbert and Zielonka establish that memoryless strategies
also suffice in two-player games with the same payoff. Thanks to Thm. 3, this entails the next theorem.

Theorem 5. Average-energy games are determined and both players have memoryless optimal strategies.

Solving average-energy games. By Thm. 5, one can guess an optimal memoryless strategy for $\mathcal{P}_2$ and
solve the remaining one-player game for $\mathcal{P}_1$, in polynomial time (by Thm. 4). The converse is also true:
one can guess the strategy of $\mathcal{P}_1$ and solve the remaining game where $S_1 = \emptyset$ in polynomial time.

Theorem 6. The AE problem for two-player games is in NP ∩ coNP.
(a) One-player $AE_{LU}$ game. (b) Play $\pi_1 = (acacacab)^\omega$. (c) Play $\pi_2 = (aacab)^\omega$. (d) Play $\pi_3 = (acaab)^\omega$.

Fig. 3: Example of a one-player $AE_{LU}$ game ($U = 3$) and the evolution of energy under different strategies
that maintain it within $[0, 3]$ at all times. The minimal average-energy is obtained with play $\pi_3$: alternating
in order between the $+1$, $+2$ and $-3$ cycles.


We prove that MP games can be encoded into AE ones in polynomial time. The former are known to
be in NP ∩ coNP but whether they belong to P is a long-standing open question (e.g., [33, 24, 8, 12]).
Hence, w.r.t. current knowledge, the NP ∩ coNP-membership of the AE problem can be considered
optimal. The key of the construction is to double each edge of the original game and modify the weight
function such that each pair of successive edges corresponding to such a doubled edge now has a total
energy level of zero, and an average-energy that is exactly equal to the weight of the original edge. Then
we apply decomposition techniques as in Lem. 2 to establish the equivalence.
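One weight assignment that satisfies both stated properties is to replace an edge of weight $w$ by two edges of weights $2w$ and $-2w$: the pair has total energy $0$ and average-energy $(2w + 0)/2 = w$. The sketch below uses this assignment as an assumption, since the text above does not spell out the weights; the function name and the fresh intermediate states `("mid", idx)` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def double_edges(edges):
    """Replace each edge (u, v, w) by u -> mid -> v with weights 2w and -2w.

    The intermediate state has a single successor, so its owner is irrelevant.
    """
    doubled = []
    for idx, (u, v, w) in enumerate(edges):
        mid = ("mid", idx)  # hypothetical fresh intermediate state
        doubled.append((u, mid, 2 * w))
        doubled.append((mid, v, -2 * w))
    return doubled


def average_energy(weights):
    """AE of a finite sequence of edge weights, starting from energy 0."""
    levels = list(accumulate(weights))
    return Fraction(sum(levels), len(levels))


# Hypothetical cycle a -> b -> a with weights 3 and -1 (mean-payoff 1).
edges = [("a", "b", 3), ("b", "a", -1)]
doubled_weights = [w for _, _, w in double_edges(edges)]
print(doubled_weights)                  # [6, -6, -2, 2]
print(average_energy(doubled_weights))  # 1
```

On this example the average-energy of the doubled cycle coincides with the mean-payoff of the original one, which is the correspondence the reduction relies on.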

Theorem 7. Mean-payoff games can be reduced to average-energy games in polynomial time. 


4 Average-Energy with Lower- and Upper-Bounded Energy 

We extend the AE framework with constraints on the running energy level of the system. Such constraints 
are natural in many applications where the energy capacity is bounded (e.g., fuel tank, battery charge). 
We first study the case where the energy is subject to both a lower bound (here, zero) and an upper bound
($U \in \mathbb{N}$). We study the problem for the fixed initial energy level $c_{init} := 0$.

Problem 2 ($AE_{LU}$). Given a game $G$, an initial state $s_{init}$, an upper bound $U \in \mathbb{N}$, and a threshold $t \in \mathbb{Q}$,
decide if $\mathcal{P}_1$ has a winning strategy $\sigma_1 \in \Sigma_1$ for the objective $\mathit{Energy}_{LU}(U, c_{init} := 0) \cap \mathit{AvgEnergy}(t)$.

Illustration. Consider the one-player game in Fig. 3. The energy constraints force $\mathcal{P}_1$ to keep the
energy in $[0, 3]$ at all times. Hence, only three strategies can be followed safely, respectively inducing
plays $\pi_1$, $\pi_2$ and $\pi_3$. Due to the bounds on energy, it is natural that strategies need to alternate between
positive and negative cycles to satisfy objective $\mathit{Energy}_{LU}(U, c_{init} := 0)$ (since no simple zero
cycle exists). It is yet interesting that to play optimally (play $\pi_3$), $\mathcal{P}_1$ actually has to use both positive
cycles, and in the appropriate order (compare plays $\pi_2$ and $\pi_3$). This type of alternation is more intricate
than for other classical objectives [11, 14, 32]. This gives a hint of the complexity of $AE_{LU}$ games.

4.1 Pseudo-polynomial algorithm and complexity bounds 

We first reduce the $AE_{LU}$ problem to the AE problem over a pseudo-polynomial expanded game, i.e.,
polynomial in the size of the original $AE_{LU}$ game and in $U \in \mathbb{N}$. By Thm. 6 and Thm. 4, this reduction
induces NEXPTIME ∩ coNEXPTIME-membership of the two-player $AE_{LU}$ problem, and
EXPTIME-membership of the one-player one. We improve the complexity for two-player games by further reducing
the AE game to an MP game. This yields EXPTIME-membership, which is optimal (Thm. 8).

Fig. 4: Reduction from the AE_LU game in Fig. 3a to an AE game and further reduction to an MP game 
over the same expanded graph. For the sake of succinctness, the weights are written as c | c' with c the 
weight used in the AE game and c' the one used in the MP game. We use the upper bound U = 3 and the 
average-energy threshold t = 1 (the optimal value in this case). The optimal play π3 = (acaab)^ω of the 
original game corresponds to an optimal memoryless play in the expanded graph. 

Observe that if U is encoded in unary or if U is polynomial in the size of the original game, the 
complexity of the AE_LU problem collapses to NP ∩ coNP for two-player games and to P for one-player 
games thanks to our reduction to an AE problem and the results of Thm. 6 and Thm. 4. 

The reductions. Given a game G = (S_1, S_2, E, w), an initial state s_init, an upper bound U ∈ ℕ, and 
a threshold t ∈ ℚ, we reduce the AE_LU problem to an AE problem as follows. If at any point along a 
play, the energy drops below zero or exceeds U, the play will be losing for the Energy_LU(U, c_init := 0) 
objective, hence also for its conjunction with the AE one. So we build a new game G' over the state space 
(S × {0, 1, ..., U}) ∪ {sink}. The idea is to include the energy level within the state labels, with sink as an 
absorbing state reached only when the energy constraint is breached. We now consider the AE problem 
for threshold t on G'. By putting a self-loop of weight 1 on sink, we ensure that if the energy constraint is 
not guaranteed in G, the answer to the AE problem in G' will be No as the average-energy will be infinite 
due to reaching this positive loop and repeating it forever. Hence, we show that the AE_LU objective can 
be won in G if and only if the AE one can be won in G' (thus avoiding the sink state). The result of the 
reduction for the game in Fig. 3a is presented in Fig. 4. 
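
The construction of the expanded game can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge-list representation, the names `expand_game` and `SINK`, and the weight kept on edges entering sink are choices made here, not taken from the paper (the text only fixes the weight 1 of the self-loop on sink). Ownership of a state (s, c) is assumed to follow the owner of s and is left out for brevity.

```python
from typing import Hashable, List, Tuple

State = Hashable
Edge = Tuple[State, State, int]  # (source, target, weight)
SINK = "sink"                    # hypothetical label for the absorbing state


def expand_game(edges: List[Edge], upper: int) -> list:
    """Build the edges of G' over (S x {0, ..., U}) U {sink}.

    Each state (s, c) records the current energy level c. A move that
    would take the energy outside [0, upper] is redirected to SINK.
    """
    expanded = []
    for (s, s2, w) in edges:
        for c in range(upper + 1):
            c2 = c + w
            if 0 <= c2 <= upper:
                expanded.append(((s, c), (s2, c2), w))
            else:
                # Assumption: the weight on this edge is irrelevant, since
                # the self-loop on SINK makes the average-energy infinite.
                expanded.append(((s, c), SINK, w))
    # Self-loop of weight 1 on sink, as in the text.
    expanded.append((SINK, SINK, 1))
    return expanded


# Toy two-state game (hypothetical, not the game of Fig. 3).
toy_edges = [("a", "b", 2), ("b", "a", -1)]
expanded_toy = expand_game(toy_edges, upper=2)
```

The number of edges of G' is |E| · (U + 1) + 1, which makes the pseudo-polynomial blow-up in U explicit.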

We then show that the AE game G' can be further reduced to an MP game G" by modifying the weight 
structure of the graph. Essentially, all edges leaving a state (s, c) of G' are given weight c in G", i.e., the 
current energy level, and the self-loop on sink is given weight (⌈t⌉ + 1). This modification is depicted in 
Fig. 4. We claim that the AE problem for threshold t ∈ ℚ in G' is equivalent to the MP problem for the 
same threshold in G". Indeed, we show that with our change of weight function, reaching sink implies 
losing, both in G' for AE and in G" for MP, and all plays that do not reach sink have the same value for 
their average-energy in G' as for their mean-payoff in G". 
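
The equality of the two values on plays avoiding sink can be made explicit by a short computation. The sketch below assumes that the average-energy is the long-run average (here written with a limit superior) of the energy levels of the prefixes of the play; the notation c_i is introduced only for this illustration.

```latex
% Assumed notation (not from the source): (s_i, c_i) is the i-th state of a play \pi'
% of G' that never reaches sink, so c_0 = 0 and 0 \le c_i \le U for all i.
\begin{align*}
\mathrm{AE}_{G'}(\pi') &= \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} c_i, &
\mathrm{MP}_{G''}(\pi') &= \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} c_i, \\
\left| \frac{1}{n} \sum_{i=1}^{n} c_i - \frac{1}{n} \sum_{i=0}^{n-1} c_i \right|
  &= \frac{|c_n - c_0|}{n} \le \frac{U}{n} \xrightarrow[n \to \infty]{} 0.
\end{align*}
```

The two averages differ only by a shift of one step, and the bounded energy level makes this shift vanish in the limit; the same argument applies if a limit inferior is used instead.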

Illustration. Consider the AE_LU game G in Fig. 3a. The optimal strategy is π3 = (acaab)^ω. Now 
consider the reduction to the AE game, and further to the MP game, depicted in Fig. 4. The optimal 
(memoryless) strategy in both the AE game G' and the MP game G" is to create the play π' = 
((a,0)(c,1)(a,1)(a,3)(b,0))^ω, which corresponds to the optimal play π3 in the original game. It can be 
checked that AE_G(π3) = AE_G'(π') = MP_G"(π'). 
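
This check can be reproduced numerically. The sketch below takes only the energy levels listed in the play π' as input; the edge weights of G' are inferred here as differences between consecutive energy levels, which is an assumption of this example rather than data read from Fig. 4. The function names are hypothetical.

```python
from fractions import Fraction

# One period of the play pi' = ((a,0)(c,1)(a,1)(a,3)(b,0))^omega.
CYCLE = [("a", 0), ("c", 1), ("a", 1), ("a", 3), ("b", 0)]


def mean_payoff_of_cycle(cycle):
    """Mean-payoff in G'': every edge leaving (s, c) has weight c."""
    weights = [c for (_, c) in cycle]
    return Fraction(sum(weights), len(weights))


def average_energy_of_cycle(cycle):
    """Average-energy in G' of the periodic play repeating `cycle`.

    Edge weights are inferred as differences of consecutive energy
    levels (assumption); the period must be a zero cycle.
    """
    n = len(cycle)
    deltas = [cycle[(i + 1) % n][1] - cycle[i][1] for i in range(n)]
    assert sum(deltas) == 0, "a periodic play needs a zero cycle for finite AE"
    energy = cycle[0][1]
    levels = []
    for d in deltas:
        energy += d           # energy level after taking this edge
        levels.append(energy)
    return Fraction(sum(levels), n)


ae_value = average_energy_of_cycle(CYCLE)
mp_value = mean_payoff_of_cycle(CYCLE)
```

Both functions return 1 on this play, in line with the threshold used in Fig. 4.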

Complexity. The reduction from the AE_LU game to the AE one induces a pseudo-polynomial blow-up 
in the number of states. Thanks to the second reduction and the use of a pseudo-polynomial algorithm for 
the MP game [33, 8], we get EXPTIME-membership, which is optimal for two-player games thanks to 
the lower bound proved for EG_LU [5]. 


Fig. 5: Families of games witnessing the need for pseudo-polynomial-memory strategies for EG_LU (and 
AE_LU) objectives. The goal of player 1 is to keep the energy in [0, U] at all times, for U ∈ ℕ. The left game is 
won by player 1 and the right one by player 2, but both require memory polynomial in the value U to be won. 


Theorem 8. The AE_LU problem is EXPTIME-complete for two-player games and at least PSPACE-hard 
for one-player games. If the upper bound U is polynomial in the size of the game or encoded in unary, 
the AE_LU problem collapses to NP ∩ coNP and P for two-player and one-player games respectively. 

4.2 Memory requirements 

We prove pseudo-polynomial lower and upper bounds on memory for the two players in AE_LU games. 
The upper bound follows from the reduction to a pseudo-polynomial AE game and the memoryless 
determinacy of AE games proved in Thm. 5. The lower bound can be witnessed in two families of games 
asking for strategies using memory polynomial in the energy upper bound U ∈ ℕ to be won by player 1 
(Fig. 5a) or player 2 (Fig. 5b) respectively. It is interesting to observe that those families already ask for such 
memory when considering the simpler EG_LU objective. 

Theorem 9. Pseudo-polynomial-memory strategies are both sufficient and necessary to win in EG_LU and 
AE_LU games with arbitrary energy upper bound U ∈ ℕ, for both players. Polynomial memory suffices 
when U is polynomial in the size of the game or encoded in unary. 

5 Average-Energy with Lower-Bounded Energy 

We conclude with the conjunction of an AE objective with a lower bound (again equal to zero) constraint 
on the running energy, but no upper bound. This corresponds to a hypothetical unbounded energy storage. 
Hence, its applicability is limited, but it may prove interesting from a theoretical standpoint. 

Problem 3 (AE_L). Given a game G, an initial state s_init and a threshold t ∈ ℚ, decide if player 1 has a winning 
strategy σ_1 ∈ Σ_1 for objective Energy_L(c_init := 0) ∩ AvgEnergy(t). 

This problem proves to be challenging to solve: we provide partial answers in the following, with a 
proper algorithm for one-player games but only a correct but incomplete method for two-player games. 

Illustration. Consider the game in Fig. 3. Recall that for AE_LU with U = 3, the optimal play is π3, 
and it requires alternation between all three different simple cycles. Now consider AE_L. One may think 
that relaxing the objective would allow for simpler winning strategies. This is not the case. Some new 
plays are now acceptable w.r.t. the energy constraint, such as π4 = (aabaaba)^ω, with AE(π4) = 11/7 
and π5 = (aaababa)^ω, with AE(π5) = 18/7. Yet, the optimal play w.r.t. the AE (under the lower-bound 
energy constraint) is still π3, hence still requires to use all the available cycles, in the appropriate order. 

5.1 One-player games 

We assume that the unique player is player 1. Indeed, the opposite case is easy as for player 2, the objective is a 
disjunction and player 2 can choose beforehand which sub-objective he will transgress, and do so with a simple 
memoryless strategy (both AE and EG_L games admit memoryless optimal strategies as seen before). We 
show how to solve a one-player AE_L problem in pseudo-polynomial time by reduction to an AE_LU problem 
for a well-chosen upper bound U ∈ ℕ and then application of the algorithm of Sect. 4.1. 

The reduction. Given a game G = (S_1, S_2 = ∅, E, w), an initial state s_init, and a threshold t ∈ ℚ, we 
reduce the AE_L problem to an AE_LU problem with an upper bound U ∈ ℕ which is pseudo-polynomial 
in the original problem. Precisely, U := t + N² + N³, with N = W · (|S| + 2). The intuition is that if player 1 
can win a one-player AE_L game, he can win it without ever reaching energy levels higher than the chosen 
bound U, even if he is technically allowed to do so. Essentially, the interest of increasing the energy is 
making more cycles available (as they become safe to take w.r.t. the lower bound constraint), but increasing 
the energy further than necessary is not a good idea as it will negatively impact the average-energy. To 
prove this reduction, we start from an arbitrary winning path in the AE_L game, and build a witness path 
that is still winning for the AE_L objective, but also keeps the energy below U at all times. Our construction 
exploits a result of Lafourcade et al. that bounds the value of the counter along a path in a one-counter 
automaton [28]. We build upon it to define an appropriate transformation leading to the witness path and 
derive a sufficiently large upper bound U ∈ ℕ for the AE_LU problem. 

Complexity. Plugging this bound U in the pseudo-polynomial-time algorithm for AE_LU games yields 
an algorithm for one-player AE_L games that is overall also pseudo-polynomial. We prove that no 
truly-polynomial-time algorithm can be obtained unless P = NP as the one-player AE_L problem is NP-hard. 
We show it by reduction from the subset-sum problem [18]. 

Memory requirements. Recall that for player 2, the situation is simpler and memoryless strategies suffice. 
By the reduction to AE_LU, we know that pseudo-polynomial memory suffices for player 1. This bound is 
tight as witnessed by the family of games already presented in Fig. 5a. To ensure the lower bound on 
energy, player 1 has to play edge (s,s') at least U times before taking the (s,s) self-loop. But to minimize the 
average-energy, edge (s,s') should never be played more than necessary. The optimal strategy is the same 
as for the AE_LU problem: playing (s,s') exactly U times, then (s,s) once, then repeating, forever. 

Theorem 10. Pseudo-polynomial-memory strategies are both sufficient and necessary to win for player 1 in 
one-player AE_L games. Memoryless strategies suffice for player 2 in such games. 

5.2 Two-player games 

Decidability. Assume that there exists some U ∈ ℕ such that player 1 has a winning strategy for the AE_LU 
problem with upper bound U and average-energy threshold t. Then, this strategy is trivially winning for 
the AE_L problem as well. This observation leads to an incremental algorithm that is correct (no false 
positives) but incomplete (it is not guaranteed to stop). In [6], we draw the outline of a potential approach 
to obtain completeness, hence decidability. 

Lemma 11. There is an algorithm that takes as input an AE_L problem and iteratively solves corresponding 
AE_LU problems for incremental values of U ∈ ℕ. If a winning strategy is found for some U ∈ ℕ, then it is 
also winning for the original AE_L problem. If no strategy is found up to value U ∈ ℕ, then no strategy 
of player 1 can simultaneously win the AE_L problem and prevent the energy from exceeding U at all times. 
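
The incremental procedure of the lemma can be sketched as follows. The solver for the bounded problem is passed in as a callable; `solve_ae_lu`, `incremental_ae_l` and the optional cut-off `max_bound` are hypothetical names introduced for this illustration, and the cut-off is a practical addition that the lemma itself does not include.

```python
from typing import Callable, Optional


def incremental_ae_l(solve_ae_lu: Callable[[int], bool],
                     max_bound: Optional[int] = None) -> Optional[int]:
    """Try upper bounds U = 0, 1, 2, ... on the bounded (AE_LU) problem.

    `solve_ae_lu(U)` is assumed to return True iff player 1 wins the
    AE_LU problem with upper bound U. Returns the first winning bound.
    Without `max_bound` the loop may run forever (the method is
    incomplete); with it, None means "no winning strategy keeps the
    energy at or below max_bound", not that player 1 loses AE_L.
    """
    upper = 0
    while max_bound is None or upper <= max_bound:
        if solve_ae_lu(upper):
            return upper  # a strategy winning AE_LU also wins AE_L
        upper += 1
    return None


# Usage with a stub solver (hypothetical): winning from U = 3 onwards.
first_bound = incremental_ae_l(lambda upper: upper >= 3)
```

A positive answer is always sound, whereas a run stopped at `max_bound` only rules out strategies that respect that bound.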

While an incomplete algorithm clearly seems limiting from a theoretical standpoint, it is worth noting 
that in practice, such approaches are common and often necessary restrictions, even for problems where a 
complete algorithm is known to exist, because theoretical bounds granting completeness are too large 
to be tackled efficiently by software synthesis tools (e.g., [14]). In our case, we have already seen that if 
such a bound exists for the two-player AE_L problem, it needs to be at least exponential in the encoding of 
the problem (cf. one-player AE_L games). Hence it seems likely that a prohibitive bound would be necessary, 
rendering the incremental algorithm of Lem. 11 more appealing in practice. 

Fig. 6: Reduction from a countdown game 𝒞 = (𝒱, ℰ) with initial configuration (v_init, c_0) to a two-player 
AE_L problem for average-energy threshold t := 0. 

Complexity lower bound. We now prove that the two-player AE_L problem would require at least 
exponential time to solve. Our proof is by reduction from countdown games. A countdown game 𝒞 is a 
weighted graph (𝒱, ℰ), where 𝒱 is the finite set of states, and ℰ ⊆ 𝒱 × (ℕ \ {0}) × 𝒱 is the edge relation. 
Configurations are of the form (v, c), v ∈ 𝒱, c ∈ ℕ. The game starts in an initial configuration (v_init, c_0) and 
transitions from a configuration (v, c) are performed as follows. First, player 1 chooses a duration d, 0 < d ≤ c, 
such that there exists e = (v, d, v') ∈ ℰ for some v' ∈ 𝒱. Second, player 2 chooses a state v' ∈ 𝒱 such that 
e = (v, d, v') ∈ ℰ. Then the game advances to (v', c − d). Terminal configurations are reached whenever 
no legitimate move is available. If such a configuration is of the form (v, 0), player 1 wins the play, otherwise 
player 2 wins. Deciding the winner given an initial configuration (v_init, c_0) is EXPTIME-complete [25]. 
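
The rules of countdown games can be illustrated by a naive solver that explores configurations with memoization. This is a minimal sketch following the rules stated above, not an algorithm from the paper; the function name and the triple-based edge encoding are assumptions. Its running time is polynomial in the value c_0, hence exponential in its binary encoding, and the recursion depth grows with c_0, so it is only meant for small instances.

```python
from functools import lru_cache
from typing import Hashable, Iterable, Tuple

CountdownEdge = Tuple[Hashable, int, Hashable]  # (v, d, v') with d > 0


def player1_wins_countdown(edges: Iterable[CountdownEdge],
                           v_init: Hashable, c0: int) -> bool:
    """Decide whether player 1 wins from configuration (v_init, c0)."""
    # succ[v][d] = set of states player 2 may choose after duration d.
    succ = {}
    for (v, d, v2) in edges:
        succ.setdefault(v, {}).setdefault(d, set()).add(v2)

    @lru_cache(maxsize=None)
    def win(v, c):
        # Durations available to player 1: 0 < d <= c with some edge (v, d, .).
        moves = [d for d in succ.get(v, {}) if 0 < d <= c]
        if not moves:
            # Terminal configuration: player 1 wins iff the counter is 0.
            return c == 0
        # Player 1 picks d; player 2 then picks any matching successor.
        return any(all(win(v2, c - d) for v2 in succ[v][d]) for d in moves)

    return win(v_init, c0)


# Hypothetical toy instance.
toy_countdown = [("u", 2, "u"), ("u", 3, "v"), ("v", 1, "u")]
toy_result = player1_wins_countdown(toy_countdown, "u", 5)
```

On the toy instance, player 1 wins from ("u", 5) by choosing duration 2 and then duration 3, which brings the counter exactly to zero.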

Our reduction is depicted in Fig. 6. The EL is initialized to c_0, then it is decreasing along any play. 
Consider the AE_L objective for AE threshold t := 0. To ensure that the energy always stays non-negative, 
player 1 has to switch to stop while the EL is no less than zero. In addition, to ensure an AE no more than 
t = 0, player 1 has to obtain an EL at most equal to zero before switching to stop (as the AE will be equal to 
this EL thanks to Lem. 1 and the zero self-loop on stop). Hence, player 1 wins the AE_L objective only if he can 
ensure a total sum of chosen durations that is exactly equal to c_0, i.e., if he can reach a winning terminal 
configuration for the countdown game. The converse also holds. 

Lemma 12. The AE_L problem is EXPTIME-hard for two-player games. 

Memory requirements. We establish that memory is needed for both players. 

Lemma 13. Pseudo-polynomial-memory strategies are necessary to win for player 1 in two-player AE_L games. 
Memory is also required for player 2 in such games. 